In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment the objective with a regularizer to address challenges associated with ill-posedness. The choice of a suitable regularizer is typically driven by prior domain information and computational considerations. Convex regularizers are attractive as they are endowed with certificates of optimality as well as the toolkit of convex analysis, but exhibit a computational scaling that makes them ill-suited beyond moderate-sized problem instances. On the other hand, nonconvex regularizers can often be deployed at scale, but do not enjoy the certification properties associated with convex regularizers. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what are the optimal regularizers, both convex and nonconvex, for data drawn from the distribution? What properties of a data source govern whether it is amenable to convex regularization? We address these questions for the class of continuous and positively homogenous regularizers for which convex and nonconvex regularizers correspond, respectively, to convex bodies and star bodies. By leveraging dual Brunn-Minkowski theory, we show that a radial function derived from a data distribution is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization. Using tools such as $\Gamma$-convergence, we show that our results are robust in the sense that the optimal regularizers for a sample drawn from a distribution converge to their population counterparts as the sample size grows large. Finally, we give generalization guarantees that recover previous results for polyhedral regularizers (i.e., dictionary learning) and lead to new ones for semidefinite regularizers.
translated by 谷歌翻译
We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models. Different from existing works, we show that code obfuscation, a standard code transformation operation, provides novel means to generate complementary `views' of a code that enable us to achieve both robust and accurate code models. To the best of our knowledge, this is the first systematic study to explore and exploit the robustness and accuracy benefits of (multi-view) code obfuscations in code models. Specifically, we first adopt adversarial codes as robustness-promoting views in CL at the self-supervised pre-training phase. This yields improved robustness and transferability for downstream tasks. Next, at the supervised fine-tuning stage, we show that adversarial training with a proper temporally-staggered schedule of adversarial code generation can further improve robustness and accuracy of the pre-trained code model. Built on the above two modules, we develop CLAWSAT, a novel self-supervised learning (SSL) framework for code by integrating $\underline{\textrm{CL}}$ with $\underline{\textrm{a}}$dversarial vie$\underline{\textrm{w}}$s (CLAW) with $\underline{\textrm{s}}$taggered $\underline{\textrm{a}}$dversarial $\underline{\textrm{t}}$raining (SAT). On evaluating three downstream tasks across Python and Java, we show that CLAWSAT consistently yields the best robustness and accuracy ($\textit{e.g.}$ 11$\%$ in robustness and 6$\%$ in accuracy on the code summarization task in Python). We additionally demonstrate the effectiveness of adversarial learning in CLAW by analyzing the characteristics of the loss landscape and interpretability of the pre-trained models.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
用于培训糖尿病性视网膜病变分类器的公共可用数据是不平衡的。生成的对抗网络可以成功合成视网膜底面图像。为了使合成图像有益,图像必须具有高质量和多样化。目前,使用了几种评估指标来评估生成对抗网络合成的图像的质量和多样性。这项工作是对文献中用于评估生成对抗网络的评估指标的适用性的第一个此类经验评估,用于在糖尿病性视网膜病变的背景下生成视网膜底面图像。 Frechet成立距离,峰值信噪比和余弦距离评估合成增殖性糖尿病夺回图像的质量和多样性的能力。进行定量分析以启用改进的方法,以选择用于增强分类器培训数据集的合成图像。结果表明,Frechet的启动距离适合评估合成图像的多样性,并用于识别该图像是否具有与其类标签相对应的特征。峰值信噪比适合指示合成图像是否具有有效的糖尿病性视网膜病变,并且其特征是否与其类标记相对应。这些结果证明了执行这种经验评估的重要性,尤其是在旨在应用于应用环境中利用的生物医学领域的背景下。
translated by 谷歌翻译
机器学习和认知科学的最新工作表明,了解因果信息对于智力的发展至关重要。使用``Blicket otter''环境的认知科学的广泛文献表明,孩子们擅长多种因果推理和学习。我们建议将该环境适应机器​​学习代理。当前机器学习算法的关键挑战之一是建模和理解因果关系:关于因果关系集的可转移抽象假设。相比之下,即使是幼儿也会自发学习和使用因果关系。在这项工作中,我们提出了一个新的基准 - 一种灵活的环境,可以评估可变因果溢出物下的现有技术 - 并证明许多现有的最新方法在这种环境中概括了困难。该基准的代码和资源可在https://github.com/cannylab/casual_overhypothess上获得。
translated by 谷歌翻译
Neural networks struggle in continual learning settings from catastrophic forgetting: when trials are blocked, new learning can overwrite the learning from previous blocks. Humans learn effectively in these settings, in some cases even showing an advantage of blocking, suggesting the brain contains mechanisms to overcome this problem. Here, we build on previous work and show that neural networks equipped with a mechanism for cognitive control do not exhibit catastrophic forgetting when trials are blocked. We further show an advantage of blocking over interleaving when there is a bias for active maintenance in the control signal, implying a tradeoff between maintenance and the strength of control. Analyses of map-like representations learned by the networks provided additional insights into these mechanisms. Our work highlights the potential of cognitive control to aid continual learning in neural networks, and offers an explanation for the advantage of blocking that has been observed in humans.
translated by 谷歌翻译
In biomedical image analysis, the applicability of deep learning methods is directly impacted by the quantity of image data available. This is due to deep learning models requiring large image datasets to provide high-level performance. Generative Adversarial Networks (GANs) have been widely utilized to address data limitations through the generation of synthetic biomedical images. GANs consist of two models. The generator, a model that learns how to produce synthetic images based on the feedback it receives. The discriminator, a model that classifies an image as synthetic or real and provides feedback to the generator. Throughout the training process, a GAN can experience several technical challenges that impede the generation of suitable synthetic imagery. First, the mode collapse problem whereby the generator either produces an identical image or produces a uniform image from distinct input features. Second, the non-convergence problem whereby the gradient descent optimizer fails to reach a Nash equilibrium. Thirdly, the vanishing gradient problem whereby unstable training behavior occurs due to the discriminator achieving optimal classification performance resulting in no meaningful feedback being provided to the generator. These problems result in the production of synthetic imagery that is blurry, unrealistic, and less diverse. To date, there has been no survey article outlining the impact of these technical challenges in the context of the biomedical imagery domain. This work presents a review and taxonomy based on solutions to the training problems of GANs in the biomedical imaging domain. This survey highlights important challenges and outlines future research directions about the training of GANs in the domain of biomedical imagery.
translated by 谷歌翻译
我们通过与与前面令牌的局部相似度,通过调节从大语料库检索的文档块来增强自动回归语言模型。尽管使用25美元\时分,我们的检索增强型变压器(RetroCro)的检索增强型变压器(RetroCr)对GPT-3和侏罗纪-1获得了可比性的性能。微调后,复古表演转换为下游知识密集型任务,如问题应答。复古结合了冷冻BERT猎犬,一种可微分的编码器和块状的横向机制,以预测基于数量级的令牌,而不是训练期间通常消耗的数量。我们通常从头开始训练复古,还可以快速改造预先接受的变压器,通过检索,仍然达到良好的性能。我们的工作通过以前所未有的规模开辟了通过显式内存改进语言模型的新途径。
translated by 谷歌翻译
贝叶斯脑假设假设大脑根据贝叶斯定理进行准确地运行统计分布。突触前囊泡释放神经递质的随机性失效可以让大脑从网络参数的后部分布中样本,被解释为认知不确定性。尚未显示出先前随机故障可能允许网络从观察到的分布中采样,也称为炼肠或残留不确定性。两个分布的采样使概率推断,高效搜索和创造性或生成问题解决。我们证明,在基于人口码的神经活动的解释下,可以用单独的突触衰竭来表示和对两种类型的分布进行分布。我们首先通过突触故障和横向抑制来定义生物学限制的神经网络和采样方案。在该框架内,我们派生基于辍学的认知不确定性,然后从突触功效证明了允许网络从任意,由接收层表示的分布来释放概率的分析映射。其次,我们的结果导致了本地学习规则,突触将适应其发布概率。我们的结果表明,在生物学限制的网络中,仅使用本地学习的突触失败率,与变分的贝叶斯推断相关的完整贝叶斯推断。
translated by 谷歌翻译
虽然通过简单的因素问题回答,文本理解的大量进展,但更加全面理解话语仍然存在重大挑战。批判性地反映出文本的人将造成好奇心驱动,通常是开放的问题,这反映了对内容的深刻理解,并要求复杂的推理来回答。建立和评估这种类型的话语理解模型的关键挑战是缺乏注释数据,特别是因为找到了这些问题的答案(可能根本不回答),需要高度的注释载荷的高认知负荷。本文提出了一种新的范式,使可扩展的数据收集能够针对新闻文件的理解,通过话语镜头查看这些问题。由此产生的语料库DCQA(疑问回答的话语理解)包括在607名英语文件中的22,430个问题答案对组成。 DCQA以自由形式,开放式问题的形式捕获句子之间的话语和语义链接。在评估集中,我们向问题上的问题提交了来自好奇数据集的问题,我们表明DCQA提供了有价值的监督,以回答开放式问题。我们还在使用现有的问答资源设计预训练方法,并使用合成数据来适应不可批售的问题。
translated by 谷歌翻译